Crosstalk between Host Genome and Metabolome among People with HIV in South Africa

Genome-wide association studies (GWAS) of circulating metabolites have revealed the role of genetic regulation on the human metabolome. Most previous investigations focused on European ancestry, and few studies have been conducted among populations of African descent living in Africa, where the infectious disease burden is high (e.g., human immunodeficiency virus (HIV)). It is important to understand the genetic associations of the metabolome in diverse at-risk populations including people with HIV (PWH) living in Africa. After a thorough literature review, the reported significant gene–metabolite associations were tested among 490 PWH in South Africa. Linear regression was used to test associations between the candidate metabolites and genetic variants. GWAS of 154 plasma metabolites were performed to identify novel genetic associations. Among the 29 gene–metabolite associations identified in the literature, we replicated 10 in South Africans with HIV. The UGT1A cluster was associated with plasma levels of biliverdin and bilirubin; SLC16A9 and CPS1 were associated with carnitine and creatine, respectively. We also identified 22 genetic associations with metabolites using a genome-wide significance threshold (p-value < 5 × 10−8). In a GWAS of plasma metabolites in South African PWH, we replicated reported genetic associations across ancestries, and identified novel genetic associations using a metabolomics approach.


Introduction
Genome-wide associations studies (GWAS), which involve investigations on associations between genetic architecture and complex disease traits, have been widely conducted to understand how the genetic profile contributes to various diseases [1]. High-resolution Metabolites 2022, 12, 624 2 of 15 metabolomics can measure large-scale profiles of small molecules in biological samples, which enables metabolome-wide association studies (MWAS) to understand a variety of diseases on a molecular basis [2]. The traditional GWAS has been hindered by the phenotyping of exposures and risk factors that are unmeasured or even unknown. By combining high-throughput phenotyping metabolomics data with GWAS (mGWAS), we have broadened our insight into many complex diseases and traits [3].
Early mGWAS research found that common single nucleotide polymorphisms (SNP) can account for as much as 12% of the variance in homeostasis of the human metabolic profile [4]. In recent years, a number of studies of blood and urine metabolites have been conducted, revealing many robust and reproducible associations between genetic variants and metabolites, with many involving enzyme catalysis and transporter proteins [5,6]. With a goal of building an integrated atlas of genomics and metabolomics, many investigations have focused on general populations of European descent [7,8], and only a few have focused on African Americans [9,10]. For example, the UGT1A1 gene encodes the enzyme in the bilirubin metabolism, and its association with bilirubin and biliverdin has been successfully established [9,11]. This particular locus is also involved in the metabolic pathways linking to several human immunodeficiency virus (HIV) medications; however, there have been no studies focusing on individuals of African ancestry who have HIV infection living in South Africa, which would permit an exploration of environmental and genetic influences on metabolic processes. In this study of people with HIV (PWH) living in South Africa who were antiretroviral therapy (ART) naive, we aimed to assess the generalizability of previously reported findings and explore the novel associations between genetic variants and metabolites in this unique cohort of genetic ancestry, geography, and health condition.

Characteristics of Study Participants
All 490 participants were South Africans, with 305 from RK Khan Hospital and 185 from Bethesda Hospital. The mean age was 34.4 (standard deviation [SD] 10.0) years, and 312 (63.7%) were female ( Table 1). The ancestry map of the South African study participants compared to the 1000 Genome Project [12] reference panel showed a well-defined South African genetic profile partially intersecting with African (AFR) ancestry but distinct from other ancestries (Figure 1).  Table S1). The Q-Q plots and Manhattan plots are presented in Supplementary Figures S1 and S2. A total of 22 genetic associations with metabolites passed the genome-wide significance level of p < 5 × 10 −8 ( Table 4). The rs887829 variant of UGT1A1/UGT1A cluster locus was one of the top associations in the GWAS of biliverdin. Several novel loci were identified, such as the G allele of rs1401798 associated with higher hypoxanthine (0.37 ± 0.06, p = 8.80 × 10 −10 ), and the G allele of rs6874865 associated with lower 3-methyl-2-oxindole (−0.48 ± 0.08, p = 8.66 × 10 −9 ).

Candidate Gene-Metabolite Associations
From the 24 reported GWAS of blood-based metabolic profiles ( Figure 2, Table 2), we searched for the metabolites analyzed and annotated in the present study. We identified a total of 29 candidate SNP-metabolite associations, including 18 independent genetic loci and 14 metabolites from previous studies, and tested these associations in the South African cohort (Table 3). Five metabolites including citrulline, glutamine, histidine, serine, and urate, were identified from two different liquid chromatography platforms and were both analyzed, resulting in 36 replications of SNP-metabolite associations. Among the 36, a total of 24 were consistent in direction regardless of statistical significance, 7 were inconsistent, and 5 could not be compared due to lack of beta coefficients in the publications. A total of seven were both consistent in direction and statistically significant. The UGT1A1/UGT1A cluster locus showed robust association, with the T allele of the lead SNP rs887829 associated with higher bilirubin (0.31 ± 0.07, p = 3.38 × 10 −6 ) and biliverdin (0.39 ± 0.07, p = 1.04 × 10 −8 ) levels. The G allele of rs1171617 in SLC16A9 and A allele of rs7422339 in CPS1 were associated with lower carnitine (−0.23 ± 0.07, p = 0.0014) and higher creatine (0.19 ± 0.07, p = 0.0045), respectively ( Figure 3, Table 3).       Ethics approval from the Biomedical Research Ethics Committee of the University of KwaZulu-Natal, the Emory University, and Mass General Brigham institutional review boards was obtained prior to the start of the study. After signed informed consent, people with HIV (PWH) who were at least 18 years of age and qualified for anti-retroviral therapy, were enrolled into the HIV AIDS Drug Resistance Surveillance Study (ADReSS). This study was based on a sub-cohort from the ADReSS participants recruited from KwaZulu-Natal, South Africa. The recruitment was conducted in 2014-2016 at two participating clinical sites, the RK Khan Hospital and Bethesda Hospital in KwaZulu-Natal, South Africa [40]. A total of 490 participants from the study with both genomics and metabolomics data available were included in the analysis. Full summary statistics of the results presented in the study are available upon request. Individual level dataset underlying this study can be requested and shared in compliance with the informed consent and IRB approval from participating institutes.

Genotyping and Imputation
Genotyping was performed on DNA extracted from blood samples of 998 participants, using the Illumina Global Screening Array. Genotype imputation for the 998 samples was performed based on the TOPMed reference panel in the genome build GRCh38 [41], using 590,511 genotyped SNPs. A total of 8,568,848 genetic variants with imputation quality R 2 > 0.5, genotype call rate >95%, Hardy-Weinberg equilibrium p > 10 −6 , and minor

GWAS
The 154 inflation factors (lambda) of GWAS had a range of 0.98-1.02 (Supplementary Table S1). The Q-Q plots and Manhattan plots are presented in Supplementary Figures S1 and S2. A total of 22 genetic associations with metabolites passed the genome-wide significance level of p < 5 × 10 −8 ( Table 4). The rs887829 variant of UGT1A1/UGT1A cluster locus was one of the top associations in the GWAS of biliverdin. Several novel loci were identified, such as the G allele of rs1401798 associated with higher hypoxanthine (0.37 ± 0.06, p = 8.80 × 10 −10 ), and the G allele of rs6874865 associated with lower 3-methyl-2-oxindole (−0.48 ± 0.08, p = 8.66 × 10 −9 ).

Discussion
Low-and middle-income countries, such as South Africa, are facing unique public health challenges with high prevalence of infectious disease, and the growing burden of noncommunicable diseases [29]. However, most genomic and metabolic studies have been conducted on individuals of European ancestry from high-income countries. Due to different genetic ancestries and environmental exposures, the genomic and metabolomic profiles and associations with European ancestry may not apply to people with non-European ancestry from low-and middle-income countries. The present mGWAS in South Africans provides one of the first catalogs to start filling the knowledge gap in diverse global populations. Understanding the genomic and metabolic characteristics from underrepresented populations is critically needed to identify molecular mechanisms driving risks for chronic infections and noncommunicable diseases. The -omics findings will also shed light on novel precision medicine applications for at-risk populations from low-and middle-income countries to reduce health disparities.
In this mGWAS conducted among 490 PWH from Africa, we replicated previously reported genetic associations primarily from European individuals living in high-income countries. The UGT1A1/UGT1A cluster locus was associated with biliverdin and bilirubin, which are both bile pigments formed during the breakdown of hemoglobin in red blood cells. Biliverdin can be converted as unconjugated bilirubin by biliverdin reductase before being further metabolized. The UGT1A1 gene encodes the bilirubin uridine diphosphate-glucuronosyltransferase, which is the enzyme responsible for conjugation, detoxification, and clearance of bilirubin, and it has been found among European, African, and Asian populations; mutations of the gene may cause unconjugated hyperbilirubinemia [30][31][32][33][34]. Further, elevated bilirubin levels were found associated with lower risk in cardiovascular events among PWH [35], and the UGT1A1 gene is particularly of interest in future studies of HIV, since it was involved in the metabolic pathways linking to several HIV medications, such as Bictegravir, Cabotegravir, Dolutegravir, Elvitegravir, and Raltegravir (https://clinicalinfo.hiv.gov/en/guidelines/hiv-clinical-guidelines-adult-andadolescent-arv/characteristics-integrase-inhibitors, accessed on 15 June 2022). The CPS1 locus encodes a mitochondrial enzyme responsible for carbamoyl phosphate synthesis from ammonia and bicarbonate with the use of adenosine triphosphate. This is the first step of the urea cycle, which can generate arginine, a critical precursor of creatine synthesis [17]. Creatine can be both naturally synthesized or ingested in a usual diet to supply energy to the muscles, and then is converted to creatinine, filtered from the blood by the glomerulus [36]. The association between CPS1 and creatine was previously reported among participants of European descent in the Framingham Heart Study [17], and another study of European participants, which demonstrated that the locus was also associated with creatinine [37]. As another critical compound in energy metabolism, carnitine transports long-chain fatty acids into the mitochondria to be oxidized and produce energy [38,39]. The product of SLC16A9 is a carnitine efflux transporter [8]; its association with carnitine was previously established among several studies in European populations but has not been generalized to other ancestries [8,17,22]. In addition, we found a borderline statistically significant association of serine and the PHGDH gene, which encodes the enzyme involved in the process of serine synthesis [7,8].
To the best of our knowledge, this is the first population study of genome-metabolome cross-talk among individuals of African ancestry living in Africa, which represents a population facing a unique public health burden of both non-communicable diseases and chronic infectious diseases. Although the relatively moderate sample size limited us from identifying more genome-wide significant association with metabolites, our results demonstrated the value of such a mGWAS in this under-represented population. Beyond several gene-metabolite associations that are generalizable across ancestries, several novel discoveries provide candidates for replication and may benefit future research endeavors in African countries. The identified gene-metabolite associations can be used as genetic instrumental variables in the Mendelian Randomization framework to investigative potential causal relationship between metabolites and disease outcomes. There are a few limitations associated with the present study. For example, the plasma samples were not collected during fasting, and information on diet was not surveyed, which may influence the abundance of certainly diet-related metabolites. A causal pathway may not be directly inferred from these observed associations. Future studies are warranted to provide guidance on translation and clinical implementations of identified genes and metabolites among the PWH population. In addition, future studies with improved metabolomic coverage and annotation will help establish a more complete atlas of genome-metabolome relationship in this unique population. With more and larger mGWAS in African populations, improved statistical power will facilitate robust findings of gene-metabolite associations to better understand molecular mechanisms, and to investigate the functional role of less frequent genetic variants on disease risk.

Study Participants
Ethics approval from the Biomedical Research Ethics Committee of the University of KwaZulu-Natal, the Emory University, and Mass General Brigham institutional review boards was obtained prior to the start of the study. After signed informed consent, people with HIV (PWH) who were at least 18 years of age and qualified for anti-retroviral therapy, were enrolled into the HIV AIDS Drug Resistance Surveillance Study (ADReSS). This study was based on a sub-cohort from the ADReSS participants recruited from KwaZulu-Natal, South Africa. The recruitment was conducted in 2014-2016 at two participating clinical sites, the RK Khan Hospital and Bethesda Hospital in KwaZulu-Natal, South Africa [40]. A total of 490 participants from the study with both genomics and metabolomics data available were included in the analysis. Full summary statistics of the results presented in the study are available upon request. Individual level dataset underlying this study can be requested and shared in compliance with the informed consent and IRB approval from participating institutes.

Genotyping and Imputation
Genotyping was performed on DNA extracted from blood samples of 998 participants, using the Illumina Global Screening Array. Genotype imputation for the 998 samples was performed based on the TOPMed reference panel in the genome build GRCh38 [41], using 590,511 genotyped SNPs. A total of 8,568,848 genetic variants with imputation quality R 2 > 0.5, genotype call rate >95%, Hardy-Weinberg equilibrium p > 10 −6 , and minor allele frequency >0.05 entered analysis. Total of 490 participants with African ancestry also had metabolomics data from plasma samples after quality control procedures.

Metabolomic Profiling
The high-resolution metabolic profiling was performed using liquid chromatography with high-resolution mass spectrometry (Thermo Scientific Fusion, Waltham, MA, US) based on an established protocol [42][43][44]. Plasma samples collected from participants prior to antiretroviral therapy initiation were stored at −80 Celsius before thawing for analysis by liquid-chromatography mass spectrometry (Thermo Scientific Fusion, Waltham, MA, USA). Thawed plasma was treated with acetonitrile containing an internal isotopic standard mix [44,45], then centrifuged for 10 min at 4 Celsius to remove protein. The remaining supernatant was then maintained in a Dionex Ultimate 3000 autosampler at 4 Celsius until analysis. Two liquid chromatography strategies were used, including Supelco Ascentis Express HILIC 100 × 2.1 mm (53939-U) columns and Higgins C18100 × 2.1 mm (TS-1021-C185) columns with positive and negative ionization mode, respectively. All samples were analyzed in triplicate, with the standard reference samples of National Institute of Standards and Technology 1950 analyzed at the beginning and end of the analysis, and pooled plasma reference samples inserted at the beginning and end of each batch of 20 study samples [46]. Metabolic feature extraction was performed using apLCMS and xMSanalyzer [47,48]. Batch effect was corrected using Combat [49]. A metabolic feature was defined by the mass-to-charge ratio (m/z) ranging from 85 to 1275, and retention time up to 5 min. Pearson correlation coefficient ≥0.7 within triplicates of the metabolic feature intensities were retained, and the triplicates of intensities were median summarized based on the non-zero readings. Then, the zero readings of each metabolic feature were imputed as random numbers below the minimum intensity across all study samples. Annotations of metabolic features were matched to an in-house library of confirmed metabolites, which were of Level 1 identification confidence per established criteria [50,51]. The matching of annotations allowed m/z differences of 10 parts-per-million and retention time differences of 30 s. After removing metabolic features with multiple matches of annotations, a total of 154 metabolites were successfully and uniquely matched, as shown in Supplementary  Table S1.

Literature Review
The diagram of the literature review is shown in Figure 2. We searched PubMed for the key words "GWAS" and "Metabolome" and found 566 articles from 2008-2022. Among those, a total of 386 articles were human studies in the English language. We reviewed these articles and found 24 blood-based GWAS of metabolic profile in Table 2. We then identified statistically significant gene-metabolite associations to pursue replication in this present study of South Africans.

Statistical Analysis
Genetic background of the 490 study participants were analyzed coupled with the 1000 Genome Project [12] reference panel, which included African (AFR), American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS). Principal components (PCs) were computed, and the ancestry map was generated based on the top two PCs that explained the most variance. For the analysis of gene-metabolite associations, the intensities of metabolites were rank-based inverse normal transformed to achieve normality. Linear regression and additive genetic models were used to test the associations between the candidate metabolites and SNPs, controlling for age, gender, study sites, and the top ten PCs, which were derived from PC analysis of the study participants. We also performed GWAS for each of the 154 metabolites identified to explore novel genetic loci, using the same statistical model. For candidate gene-metabolite association analysis, we considered p < 0.05 as statistically significant, and the genome-wide significance threshold p < 5 × 10 −8 was used for GWAS. All analyses were conducted in R version 4.0.2 (Vienna, Austria. URL Available online: https://www.R-project.org/ (accessed on 15 June 2022) and the PLINK software [52].

Conclusions
Among a cohort of PWH living in South Africa prior to ART initiation, we identified genetic associations with metabolites across ancestries as well as unique to ancestry group, which supports the importance of genomic association studies in diverse ancestries. Findings in gene-metabolite associations from diverse ancestries and geographic regions can provide insights to the disease risk at the molecular level and reduce health disparities for underrepresented populations.